14. Coding: Scoring Min & Max Levels

## Starting to Code Minimax

Now that we have a complete game class, and the lectures have introduced the core concept of minimax while walking through the basic procedure in the quizzes, the next step is to do it all with code. In this exercise you're going to implement the two helper functions required by the minimax algorithm.

The Minimax Algorithm:

Recall the minimax algorithm originally presented at the beginning of this lesson (shown below). We will start by coding helper functions: a terminal state test, the min-value function, and the max-value function. Notice that the min-value() and max-value() functions only need to return a single number -- a "utility value" -- which is the score of the current branch; the function for selecting a move based on those scores will come in a later quiz.

Simplifying Assumptions:

We will also make two simplifying assumptions in order to adhere to the conventions of Thad's quizzes:

  • Assumption 1: a state is terminal if the active player has no remaining moves
  • Assumption 2: the board utility is -1 if it terminates at a max level, and +1 if it terminates at a min level

The first assumption is only required to support the second assumption. In general, we can determine that a game is terminal if either player has no remaining moves, but that would require terminal nodes at both min and max levels to support returning +1 or -1 depending on which player is the winner. Restricting the terminal condition to the active player means that there is only one possible return value at min or max nodes, but the consequence is that the search will sometimes go one layer deeper in the tree than absolutely necessary.

The second assumption is specified in the lecture quizzes. Technically, any pair of values can be used to indicate wins and losses so long as they admit an ordering such that the score for winning is greater than the score for losing; e.g., instead of -1 & +1 you could you use -π & π/2, or 100.99 & 101.0, or -∞ & +∞. It is common to use -∞ & +∞ when a heuristic function is used (which we'll do in another project for this module) because every possible heuristic estimate lies between those bounds, which avoids a possible bug where your agent prefers a non-terminal state scored by the heuristic to a terminal state with a "true" win or loss score.

Important Note: Notice that the value does not depend on which player is "active" on the board. A win for the searching player (the player that initiated the search from the root of the game tree) is always worth +1 and a loss is always worth -1. One very common mistake is to "flip" the utility between min and max nodes, but the score should be relative to the desirability of the outcome for the searching player.

Start Quiz:


def terminal_test(gameState):
    """ Return True if the game is over for the active player
    and False otherwise.
    """
    # TODO: finish this function!
    pass


def min_value(gameState):
    """ Return the value for a win (+1) if the game is over,
    otherwise return the minimum value over all legal child
    nodes.
    """
    # TODO: finish this function!
    pass


def max_value(gameState):
    """ Return the value for a loss (-1) if the game is over,
    otherwise return the maximum value over all legal child
    nodes.
    """
    # TODO: finish this function!
    pass
# NOTE: This function is provided for reference; it is not imported or used
# by the test code when you submit (this allows for dependency isolation
# in the test cases)

from copy import deepcopy

xlim, ylim = 3, 2  # board dimensions

class GameState:
    """
    Attributes
    ----------
    _board: list(list)
        Represent the board with a 2d array _board[x][y]
        where open spaces are 0 and closed spaces are 1
    
    _parity: bool
        Keep track of active player initiative (which
        player has control to move) where 0 indicates that
        player one has initiative and 1 indicates player 2
    
    _player_locations: list(tuple)
        Keep track of the current location of each player
        on the board where position is encoded by the
        board indices of their last move, e.g., [(0, 0), (1, 0)]
        means player 1 is at (0, 0) and player 2 is at (1, 0)
    
    """

    def __init__(self):
        self._board = [[0] * ylim for _ in range(xlim)]
        self._board[-1][-1] = 1  # block lower-right corner
        self._parity = 0
        self._player_locations = [None, None]

    def forecast_move(self, move):
        """ Return a new board object with the specified move
        applied to the current game state.
        
        Parameters
        ----------
        move: tuple
            The target position for the active player's next move
        """
        if move not in self.get_legal_moves():
            raise RuntimeError("Attempted forecast of illegal move")
        newBoard = deepcopy(self)
        newBoard._board[move[0]][move[1]] = 1
        newBoard._player_locations[self._parity] = move
        newBoard._parity ^= 1
        return newBoard

    def get_legal_moves(self):
        """ Return a list of all legal moves available to the
        active player.  Each player should get a list of all
        empty spaces on the board on their first move, and
        otherwise they should get a list of all open spaces
        in a straight line along any row, column or diagonal
        from their current position. (Players CANNOT move
        through obstacles or blocked squares.) Moves should
        be a pair of integers in (column, row) order specifying
        the zero-indexed coordinates on the board.
        """
        loc = self._player_locations[self._parity]
        if not loc:
            return self._get_blank_spaces()
        moves = []
        rays = [(1, 0), (1, -1), (0, -1), (-1, -1),
                (-1, 0), (-1, 1), (0, 1), (1, 1)]
        for dx, dy in rays:
            _x, _y = loc
            while 0 <= _x + dx < xlim and 0 <= _y + dy < ylim:
                _x, _y = _x + dx, _y + dy
                if self._board[_x][_y]:
                    break
                moves.append((_x, _y))
        return moves

    def _get_blank_spaces(self):
        """ Return a list of blank spaces on the board."""
        return [(x, y) for y in range(ylim) for x in range(xlim)
                if self._board[x][y] == 0]

import minimax_helpers

from gamestate import *

g = GameState()

print("Calling min_value on an empty board...")
v = minimax_helpers.min_value(g)

if v == -1:
    print("min_value() returned the expected score!")
else:
    print("Uh oh! min_value() did not return the expected score.")

def terminal_test(gameState):
    """ Return True if the game is over for the active player
    and False otherwise.
    """
    return not bool(gameState.get_legal_moves())  # by Assumption 1


def min_value(gameState):
    """ Return the value for a win (+1) if the game is over,
    otherwise return the minimum value over all legal child
    nodes.
    """
    if terminal_test(gameState):
        return 1  # by Assumption 2
    v = float("inf")
    for m in gameState.get_legal_moves():
        v = min(v, max_value(gameState.forecast_move(m)))
    return v


def max_value(gameState):
    """ Return the value for a loss (-1) if the game is over,
    otherwise return the maximum value over all legal child
    nodes.
    """
    if terminal_test(gameState):
        return -1  # by assumption 2
    v = float("-inf")
    for m in gameState.get_legal_moves():
        v = max(v, min_value(gameState.forecast_move(m)))
    return v